Configuring Amazon Redshift as a target

Set up your Amazon Redshift cluster as a target by gathering the details that Data Integration needs to connect to it.

Prerequisites

Make sure that you meet the following prerequisites:

Select a source

Setting up a target requires selecting a Source and creating a connection to it.

Amazon Redshift connection

Ensure that you have established a connection with Amazon Redshift as your Target.

Procedure

Click the curved arrow next to Schema on the right side of the row. After the refresh, click the row and choose the Schema in which to store the data.
Enter the Table Name.
Select a Distribution Method (available only when the Data Flow Mode is Custom Report or Custom Query):
- All - Distributes a complete copy of the table to every node in the cluster. Best suited to tables that are static or rarely updated (slow-moving). It offers little benefit for small tables, because the cost of redistributing them during a query is already low.
- Even - The leader node distributes rows across slices in a round-robin manner, regardless of the values in any specific column. This method suits tables that do not participate in joins, and cases where neither Key nor All distribution offers a clear advantage.
- Key - Distributes rows according to the values in a designated column. The leader node places rows with matching values on the same node slice. This approach works well when you distribute two tables on their joining keys: rows with matching values in the common columns are stored together, which enables efficient join operations.
  When using the Key Distribution Method, you must choose a single key column to distribute on.
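
The three distribution methods above map onto the DISTSTYLE and DISTKEY table attributes of Redshift's CREATE TABLE statement. The following is a minimal Python sketch of that mapping, not the statement that Data Integration itself generates. The function names, the example table, and its columns are hypothetical and used for illustration only.

```python
from typing import Dict, Optional

# The three distribution methods offered in the Target tab.
VALID_STYLES = {"ALL", "EVEN", "KEY"}


def dist_clause(style: str, key_column: Optional[str] = None) -> str:
    """Build the distribution clause of a Redshift CREATE TABLE statement."""
    style = style.upper()
    if style not in VALID_STYLES:
        raise ValueError(f"unknown distribution method: {style}")
    if style == "KEY":
        # Key distribution needs exactly one column to distribute on.
        if not key_column:
            raise ValueError("Key distribution requires a single key column")
        return f'DISTSTYLE KEY DISTKEY ("{key_column}")'
    if key_column:
        raise ValueError(f"{style} distribution does not take a key column")
    return f"DISTSTYLE {style}"


def create_table_sql(
    table: str,
    columns: Dict[str, str],
    style: str,
    key_column: Optional[str] = None,
) -> str:
    """Assemble a CREATE TABLE statement (hypothetical helper, for illustration)."""
    if key_column is not None and key_column not in columns:
        raise ValueError(f"key column {key_column!r} is not in the table")
    cols = ", ".join(f'"{name}" {ctype}' for name, ctype in columns.items())
    return f'CREATE TABLE "{table}" ({cols}) {dist_clause(style, key_column)};'


if __name__ == "__main__":
    # Hypothetical table distributed on its join key.
    print(
        create_table_sql(
            "orders",
            {"order_id": "BIGINT", "customer_id": "BIGINT"},
            "key",
            "customer_id",
        )
    )
```

Distributing a second table, such as a hypothetical customers table, on the same customer_id column would place matching rows on the same slice, which is the join scenario described for the Key method.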

note

You cannot alter sort keys. To change them, you must either recreate the target table manually or use Overwrite mode in Data Integration.

Procedure

When using Multi-Tables or Predefined Reports as your Data Flow Mode, you can choose a Distribution Method in the Table Settings, which you open by clicking a table in the Schema tab.

Set the Loading mode.

note

Amazon Redshift offers the flexibility to choose different Merge methods. For more information about these options, refer to the Amazon Redshift Upsert-Merge Loading Mode Options topic.

In the Additional Options menu, the following options are available:
- Truncate Columns - handles cases where an array's length exceeds the VARCHAR limit in Redshift. Because Redshift's array type has limited flexibility and arrays can vary between data loads, Data Integration ensures compatibility by converting arrays into the VARCHAR(max) type. If an array exceeds the VARCHAR length, use the Truncate Columns option, available under Advanced Options in the Target tab, to truncate the array data to fit within Redshift's size constraints.
- Compression Update - updates the column compression in the target table during data loading. This action occurs only if the target table does not already exist.
- Keep Schema-binding Views - ensures that all schema-binding views remain intact when you use the upsert-merge or overwrite methods. If not selected, the platform drops any schema-binding views that depend on the target table.
- Add Data Integration Metadata - automatically adds three columns to the Target table: last_update, river_id, and run_id. You can also add further metadata fields using expressions.

note

When the Source is in Multi Table mode, this option becomes available.
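
To make the Truncate Columns behavior described above more concrete, here is a minimal Python sketch of serializing an array to text and cutting it to a byte limit. It is an illustration under stated assumptions, not Data Integration's implementation: the function name is hypothetical, JSON is assumed as the text form of the array, and the limit of 65535 bytes is the documented maximum size of a Redshift VARCHAR(MAX) column.

```python
import json

# Maximum size in bytes of a Redshift VARCHAR(MAX) column.
REDSHIFT_VARCHAR_MAX_BYTES = 65535


def array_to_varchar(values, limit=REDSHIFT_VARCHAR_MAX_BYTES, truncate=True):
    """Serialize an array to text that fits in a VARCHAR of `limit` bytes.

    Hypothetical helper: JSON is assumed as the serialized form.
    """
    text = json.dumps(values, ensure_ascii=False)
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    if not truncate:
        # Without truncation, an oversized value would fail to load.
        raise ValueError(
            f"value is {len(encoded)} bytes, which exceeds the {limit}-byte limit"
        )
    # Cut at the byte limit; errors="ignore" drops a multi-byte character
    # that the cut would otherwise split in half.
    return encoded[:limit].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    short = array_to_varchar([1, 2, 3])
    oversized = array_to_varchar(list(range(50000)))
    print(short, len(oversized.encode("utf-8")))
```

Note that a truncated value is generally no longer valid JSON, so downstream queries should treat it as plain text.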

If you have configured a Custom File Zone, select a Bucket and specify a path to store your data. You can also establish a time frame for period partitioning within a FileZone folder.

note

You can have Data Integration partition the data by insertion time at the Day, Day/Hour, or Day/Hour/Minute level. Data Integration writes the data files from your sources to folders that correspond to the selected partition.
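
As a rough illustration of the partition options above, the following Python sketch derives a folder path from an insertion timestamp at each granularity. The folder naming scheme, the function name, and the example base path are assumptions made for this example; the actual layout that Data Integration produces in the FileZone may differ.

```python
from datetime import datetime

# Assumed folder layouts for each granularity (hypothetical naming scheme).
PARTITION_FORMATS = {
    "day": "%Y%m%d",
    "day/hour": "%Y%m%d/%H",
    "day/hour/minute": "%Y%m%d/%H/%M",
}


def partition_folder(base_path: str, inserted_at: datetime, granularity: str) -> str:
    """Return the folder for a file inserted at `inserted_at`."""
    try:
        fmt = PARTITION_FORMATS[granularity.lower()]
    except KeyError:
        raise ValueError(f"unknown partition granularity: {granularity}") from None
    # Normalize the base path so exactly one slash precedes the partition.
    return f"{base_path.rstrip('/')}/{inserted_at.strftime(fmt)}"


if __name__ == "__main__":
    ts = datetime(2024, 3, 5, 14, 7)
    for granularity in ("Day", "Day/Hour", "Day/Hour/Minute"):
        print(partition_folder("landing/orders/", ts, granularity))
```

A finer granularity produces more, smaller folders, which can make it easier to locate the files from a specific load.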

Any Source can send data to your Amazon Redshift bucket.
